# Codebook: Medicaid MCO Procurement Claims Analysis

## claims_extracted.csv

| Variable | Type | Description | Values/Range |
|----------|------|-------------|--------------|
| `claim_id` | string | Unique claim identifier | Format: CLM_XXXXXX |
| `state` | string | US state of document origin | 32 states + DC |
| `document_id` | string | Source document identifier | Format: DOC_XXXXXX |
| `document_type` | string | Document classification | RFP, Proposal, Contract, Scoring, Award, Other |
| `year` | integer | Procurement year | 2017-2024 |
| `claim_text` | string | Verbatim extracted claim text | Free text |
| `pattern_type` | string | Claim classification | metric, improvement, change, target, rate |
| `numeric_value` | float | Extracted numeric value | Typically 0-100; may exceed 100 (percentage points) |
| `has_numeric` | boolean | Whether claim contains extractable number | TRUE/FALSE |
| `measure_reference` | string | Quality measure mentioned (if any) | HEDIS/CAHPS codes or NULL |
| `extraction_confidence` | float | Extraction confidence score | 0.0-1.0 |

### Pattern Type Definitions

| Code | Definition | Example |
|------|------------|---------|
| `metric` | Reference to quality measure | "HEDIS ADV-E measure" |
| `improvement` | Claim of percentage improvement | "15% improvement in..." |
| `change` | Statement of increase/decrease | "reduced by 20%" |
| `target` | Future performance commitment | "achieve 90% by 2025" |
| `rate` | Current performance rate | "current rate of 85%" |

## partnerships_extracted.csv

| Variable | Type | Description | Values/Range |
|----------|------|-------------|--------------|
| `partnership_id` | string | Unique partnership identifier | Format: PTR_XXXXXX |
| `state` | string | US state of document origin | 32 states + DC |
| `document_id` | string | Source document identifier | Format: DOC_XXXXXX |
| `partner_text` | string | Verbatim partnership reference | Free text |
| `partner_type` | string | Partner organization type | CBO, Health System, Technology, Academic, Government |

## document_inventory.csv

| Variable | Type | Description | Values/Range |
|----------|------|-------------|--------------|
| `document_id` | string | Unique document identifier | Format: DOC_XXXXXX |
| `state` | string | US state | 32 states + DC |
| `document_type` | string | Document classification | RFP, Proposal, Contract, Scoring, Award, Other |
| `year` | integer | Procurement year | 2017-2024 |
| `file_name` | string | Original filename | Text |
| `file_size_bytes` | integer | File size | Positive integer |
| `mco_name` | string | MCO name (if identifiable) | Text or NULL |
| `format` | string | File format | PDF, DOCX, ZIP |
| `extracted` | boolean | Whether text extraction succeeded | TRUE/FALSE |

## state_summary.csv

| Variable | Type | Description |
|----------|------|-------------|
| `state` | string | US state |
| `region` | string | Region as defined under Geographic Coding (Northeast, Southeast, Midwest, Southwest, West) |
| `documents` | integer | Number of source documents |
| `claims` | integer | Total extracted claims |
| `partnerships` | integer | Total extracted partnerships |
| `mean_claim_value` | float | Mean numeric claim value |
| `median_claim_value` | float | Median numeric claim value |
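As an illustration of how `mean_claim_value` and `median_claim_value` could be derived from `claims_extracted.csv`, here is a minimal sketch. It assumes that only rows with `has_numeric` equal to `TRUE` contribute, which the codebook does not confirm; the function name `summarize_claim_values` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_claim_values(rows):
    """Group numeric claim values by state and compute the mean and median.

    `rows` is an iterable of dicts with the claims_extracted.csv columns,
    all values as strings (for example, from csv.DictReader).
    """
    values_by_state = defaultdict(list)
    for row in rows:
        # Assumption: rows without an extractable number are excluded.
        if row["has_numeric"] == "TRUE":
            values_by_state[row["state"]].append(float(row["numeric_value"]))
    return {
        state: {
            "mean_claim_value": mean(values),
            "median_claim_value": median(values),
        }
        for state, values in values_by_state.items()
    }
```

States whose claims contain no numeric values are absent from the result rather than reported with a placeholder.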

## hedis_outcomes.csv

| Variable | Type | Description |
|----------|------|-------------|
| `mco_name` | string | MCO organization name |
| `measure_name` | string | HEDIS measure name |
| `measure_year` | integer | Measurement year (2023) |
| `rate` | float | Performance rate (0-100) |
| `denominator` | integer | Number of members in the eligible population |
| `numerator` | integer | Number of members meeting the measure criteria |
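The three numeric columns can be cross-checked against each other. The sketch below assumes that `rate` equals `numerator / denominator` expressed as a percentage, which is the usual convention for rate-based measures but is not stated in this codebook. The function names and the `tolerance` default are hypothetical.

```python
def expected_rate(numerator, denominator):
    """Rate implied by the counts, as a percentage; None if the denominator is not positive."""
    if denominator <= 0:
        return None
    return 100.0 * numerator / denominator


def rate_is_consistent(rate, numerator, denominator, tolerance=0.05):
    """True if the reported rate is within `tolerance` points of the implied rate."""
    expected = expected_rate(numerator, denominator)
    if expected is None:
        return False
    return abs(rate - expected) <= tolerance
```

For example, `rate_is_consistent(75.0, 45, 60)` holds because 45 of 60 members is exactly 75 percent. The tolerance allows for rates that were rounded before being recorded.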

## Geographic Coding

| Region | States |
|--------|--------|
| Northeast | DE, MA, NH, RI, DC |
| Southeast | FL, GA, KY, LA, MS, TN, VA, WV |
| Midwest | IL, IN, IA, KS, MI, MN, MO, NE, OH |
| Southwest | AZ, NM, OK, TX |
| West | CA, CO, HI, NV, OR, WA |
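The region table above translates directly into a lookup. This is a minimal sketch; the names `REGION_STATES`, `STATE_TO_REGION`, and `region_for` are hypothetical and only the state lists come from the codebook.

```python
# State lists copied from the Geographic Coding table.
REGION_STATES = {
    "Northeast": ["DE", "MA", "NH", "RI", "DC"],
    "Southeast": ["FL", "GA", "KY", "LA", "MS", "TN", "VA", "WV"],
    "Midwest": ["IL", "IN", "IA", "KS", "MI", "MN", "MO", "NE", "OH"],
    "Southwest": ["AZ", "NM", "OK", "TX"],
    "West": ["CA", "CO", "HI", "NV", "OR", "WA"],
}

# Invert the table so each state code maps to its region.
STATE_TO_REGION = {
    state: region
    for region, states in REGION_STATES.items()
    for state in states
}


def region_for(state):
    """Return the region for a two-letter state code, or None if it is not in the table."""
    return STATE_TO_REGION.get(state.strip().upper())
```

With the table as written, the lookup holds 32 codes including DC. Codes outside the table return `None` rather than raising an error.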

## Missing Data Codes

| Code | Meaning |
|------|---------|
| NULL | Not applicable or not extracted |
| -999 | Unable to determine |
| NA | Missing data |
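When loading the files, the three codes above can be detected before any numeric parsing so that `-999` is never treated as a real value. The following is a minimal sketch with hypothetical names (`MISSING_CODES`, `missing_reason`, `clean_value`). It assumes the codes appear exactly as listed, so a variant such as `-999.0` would not be recognized.

```python
# Codes and meanings copied from the Missing Data Codes table.
MISSING_CODES = {
    "NULL": "Not applicable or not extracted",
    "-999": "Unable to determine",
    "NA": "Missing data",
}


def missing_reason(value):
    """Return the codebook meaning if `value` is a missing-data code, else None."""
    return MISSING_CODES.get(str(value).strip())


def clean_value(value):
    """Replace any missing-data code with None; return other values unchanged."""
    return None if missing_reason(value) is not None else value
```

Keeping `missing_reason` separate from `clean_value` preserves the distinction between the three codes for analyses that need it, while `clean_value` collapses them for routine use.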
